Abstract

A methodology based on the concept of variable string length GA (VGA) is developed for automatically determining the number of hyperplanes for modeling the class boundaries in the GA-classifier. The genetic operators and fitness function are newly defined to take care of the variability in chromosome length. It is proved that the method arrives at the optimal number of misclassifications after a sufficiently large number of iterations, and needs a minimal number of hyperplanes for this purpose. Experimental results on different artificial and real-life data sets demonstrate that the classifier, using the concept of variable length chromosomes, can automatically evolve an appropriate value of H, and also provides performance better than that of the fixed length version. A comparison with another approach using VGA is also provided.

Keywords : Genetic algorithms, optimum hyperplane fitting, speech recognition, variable string length.
1 Introduction


Genetic Algorithms (GAs) (Goldberg, 1989) are randomized search and optimization techniques guided by the principles of evolution and natural genetics. They are efficient, adaptive and robust search processes that produce near-optimal solutions and have a large amount of implicit parallelism. Applications of GAs to various pattern recognition problems are described in (Pal and Wang, 1996; Gelsema, 1995). One such application, for designing a classifier, is provided in (Bandyopadhyay et al., 1995), where the searching capability of GAs is exploited for the placement of a number of hyperplanes, say H, for approximating the decision boundaries. The method involves encoding the parameters of the hyperplanes in binary strings, called chromosomes, and searching for the placement of the hyperplanes in the feature space that yields minimum misclassification. It was demonstrated in (Bandyopadhyay et al., 1995) that the GA based classifier, subsequently referred to as the GA-classifier, can be applied to a variety of data sets having non-overlapping, non-convex and overlapping classes. Its recognition scores were found to be comparable to, and sometimes better than, those of the k-NN rule (for different values of k), the Bayes maximum likelihood classifier and a multilayer perceptron based classifier.

Note that estimation of a proper value of H is crucial for good performance of the algorithm. Since this is difficult to achieve, one may frequently use a conservative (i.e., large) value of H while designing the classifier. This, first of all, leads to an overdependence of the algorithm on the training data, especially for small sample sizes. In other words, since a large number of hyperplanes can readily and closely fit the classes, this may provide good performance during training but poor generalization capability. Secondly, a large value of H unnecessarily increases the computational effort, and may lead to the presence of redundant hyperplanes in the final decision boundary. (A hyperplane is termed redundant if its removal has no effect on the classification capability of the GA-classifier.)

In order to overcome these limitations, a method is described here to automatically evolve the value of H as a parameter of the problem. For this purpose, the concept of variable length strings in GA has been adopted. Unlike the conventional GA, here the length of a string is not fixed. Crossover and mutation operators are defined accordingly. A factor has been incorporated into the fitness function that rewards a string with a smaller number of misclassified samples as well as a smaller number of hyperplanes. Let the classifier so designed, utilizing the concept of variable string lengths, be called the VGA-classifier. Issues of minimum misclassification error and minimum number of required hyperplanes are theoretically analyzed under limiting conditions.

One may note the difference between the proposed classification method and the one described in (Srikanth et al., 1995), which also uses the concept of variable length strings. In the latter method, the decision boundary was modeled by a variable number of ellipsoids, which have a higher degree of complexity than hyperplanes. The fitness of a string was determined from the number of misclassified samples only. Thus there was no incentive for reducing the number of ellipsoids, although a factor favouring more compact ellipsoids was introduced.

The experimental results on speech data, Iris data and two artificially generated data sets show that the proposed classifier is able to reduce the number of hyperplanes significantly while retaining the classification performance of the previous fixed length GA-classifier. A comparison with the classifier implemented using the operators of Srikanth et al. (1995) is also provided.


2  Genetic  Algorithm  with  Variable  String 
Length  and  the  Classification  Criteria 


The concept of variable string lengths in genetic algorithms has been used earlier in (Smith, 1980) to encode sets of fixed length rules. The messy genetic algorithm (Goldberg et al., 1989) also uses the concept of variable string lengths for constructing chromosomes which may be under- or over-specified. GAs with variable string length have been used in (Harp and Samad, 1992) for encoding a variable number of fixed length blocks in order to construct layers of a neural network, and in (Maniezzo, 1994) for the genetic evolution of the topology and weight distribution of neural networks.

As mentioned in Section 1, the GA-classifier (Bandyopadhyay et al., 1995) with fixed H, and consequently fixed string length, is rigid, and therefore has several limitations such as overfitting of the training data and the presence of redundant hyperplanes in the decision boundary when a conservative value of H is used. To overcome these limitations, the use of variable length strings representing a variable number of hyperplanes for modeling the decision boundary seems natural and appropriate. This would eliminate the need for fixing the value of H, evolving it adaptively instead, and thereby providing an optimal value of H.

It is to be noted that, in the process, if we aim at reducing the number of misclassified points only, as was the case for fixed length strings, then the algorithm may try to fit as many hyperplanes as possible for this purpose. This, in turn, would be harmful to the generalization capability of the classifier. Thus the fitness function should be defined in such a way that its maximization ensures primarily the minimization of the number of misclassified samples and, secondarily, the minimization of the number of hyperplanes required.

While incorporating the concept of variable string lengths, one may note that it is necessary to either modify the existing genetic operators or introduce new ones. In order to utilize the existing operators as much as possible, a new representation scheme involving the ternary alphabet set {0, 1, #}, where # represents the don't care position, is used. For applying the conventional crossover operator, the two strings, which may now be of unequal lengths, can be made of equal length by appropriately padding one of them with #s. However, some extra processing steps have to be defined in order to tackle the presence of #s in the strings. Similarly, the mutation operator needs to be suitably modified such that it has sufficient flexibility to change the string length while retaining the flavour of the conventional operator. (As will be evident in the next section, the genetic operators are defined in such a way that the inclusion of # in the strings does not affect their binary characteristics for encoding and decoding purposes.) The classifier thus formed using variable string length GA (or VGA) is referred to as the VGA-classifier.

Therefore the objective of the VGA-classifier is to place an appropriate number of hyperplanes in the feature space such that it, first of all, minimizes the number of misclassified samples and then attempts to reduce the number of hyperplanes. Using variable length strings enables one to check, automatically and efficiently, various decision boundaries consisting of different numbers of hyperplanes in order to attain the said criterion. The description of such a classifier is given in the next section.


3 Description of VGA-classifier


As  evident  from  the  previous  section,  although  the  sequence  of  the  different 
operations  for  GA  (as  shown  in  Fig.  1)  is  applicable  to  VGA  too,  the  operators 
themselves  are  newly  defined  for  VGA.  They  are  described  here. 

3.1 Chromosome Representation and Population Initialization


The chromosomes are represented by strings of 1, 0 and # (don't care), encoding the parameters of a variable number of hyperplanes. In $\mathbb{R}^N$, N parameters are required for representing one hyperplane. These are $N - 1$ angle variables, $angle^i_1, \ldots, angle^i_{N-1}$, indicating the orientation of hyperplane i ($i = 1, 2, \ldots, H$, when H hyperplanes are encoded in the chromosome), and one perpendicular distance variable, $p^i$, indicating its perpendicular distance from the origin. Let $H_{max}$ represent the maximum number of hyperplanes that may be required to model the decision boundary of a given data set. It is specified a priori. Let the angle and perpendicular distance variables be represented by $b_1$ and $b_2$ bits respectively. Then $l_H$, the number of bits required to represent a hyperplane, and $l_{max}$, the maximum length that a string can have, are

$l_H = (N - 1) * b_1 + b_2$    (1)

$l_{max} = H_{max} * l_H$    (2)

respectively.

Let string i represent $H_i$ hyperplanes. Then its length $l_i$ is

$l_i = H_i * l_H$.

The initial population is created in such a way that the first and the second strings encode the parameters of $H_{max}$ and 1 hyperplanes respectively, to ensure sufficient diversity in the population. For the remaining strings, the number of hyperplanes, $H_i$, is generated randomly in the range $[1, H_{max}]$, and the $l_i$ bits are initialized randomly to 1s and 0s.
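The string lengths of Eqs. (1) and (2) and the initialization rule can be sketched in Python as follows. This is only a minimal illustration: the function names, the list-of-characters representation, and the values chosen for $b_1$, $b_2$, the population size and the seed are assumptions made for the example, not details taken from the method itself.

```python
import random

def hyperplane_bits(n_dims, b1, b2):
    """Bits per hyperplane, Eq. (1): (N - 1) angle fields of b1 bits plus one distance field of b2 bits."""
    return (n_dims - 1) * b1 + b2

def init_population(pop_size, h_max, l_h, rng):
    """Build the initial population as lists of '0'/'1' characters.

    The first string encodes h_max hyperplanes and the second encodes one;
    every other string gets a hyperplane count drawn uniformly from [1, h_max].
    """
    counts = [h_max, 1] + [rng.randint(1, h_max) for _ in range(pop_size - 2)]
    return [[rng.choice("01") for _ in range(h * l_h)] for h in counts[:pop_size]]

rng = random.Random(0)                         # fixed seed only to make the sketch repeatable
l_h = hyperplane_bits(n_dims=2, b1=8, b2=16)   # b1 and b2 are illustrative values
population = init_population(pop_size=20, h_max=10, l_h=l_h, rng=rng)
```

Storing only the defined bits means that the number of hyperplanes in a string can always be recovered as its length divided by $l_H$.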

3.2 Fitness Computation

As mentioned in Section 2, the fitness function (which is maximized) is defined in such a way that

i : a string with a smaller number of misclassifications is considered to be fitter than a string with a larger number, irrespective of the number of hyperplanes, i.e., it first of all minimizes the number of misclassified points, and then

ii : among two strings providing the same number of misclassifications, the one with the smaller number of hyperplanes is considered to be fitter.

The number of misclassified points for a string i encoding $H_i$ hyperplanes is found as follows: Let the $H_i$ hyperplanes provide $M_i$ distinct regions which contain at least one training data point. (Note that although $M_i \le 2^{H_i}$, in reality it is upper bounded by the size of the training data set.) For each such region, the class of the majority of the training data points that lie in it is determined, and the region is considered to represent (or be labeled by) the said class. Points of other classes that lie in this region are considered to be misclassified. The sum of the misclassifications for all the $M_i$ regions constitutes the total misclassification $miss_i$ associated with the string. Accordingly, the fitness of string i may be defined as

$fit_i = (n - miss_i) - \alpha H_i$, for $1 \le H_i \le H_{max}$    (3)

$fit_i = 0$, otherwise,    (4)

where n = size of the training data set and $\alpha = 1 / H_{max}$.
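The fitness computation described above can be sketched as follows. The sketch assumes that each hyperplane has already been decoded from the chromosome into a normal vector w and an offset d (so that a point x lies on the non-negative side when w·x - d >= 0); this decoded form, the function names and the toy data are illustrative assumptions, since the decoding of the angle and distance variables is not detailed here.

```python
from collections import Counter, defaultdict

def region_signature(point, hyperplanes):
    """Tuple of 0/1 flags: which side of each hyperplane (w, d) the point lies on."""
    return tuple(
        int(sum(w_k * x_k for w_k, x_k in zip(w, point)) - d >= 0)
        for w, d in hyperplanes
    )

def misclassifications(points, labels, hyperplanes):
    """Count points that disagree with the majority class of their region."""
    regions = defaultdict(Counter)
    for x, y in zip(points, labels):
        regions[region_signature(x, hyperplanes)][y] += 1
    return sum(sum(c.values()) - max(c.values()) for c in regions.values())

def fitness(points, labels, hyperplanes, h_max):
    """Eqs. (3)-(4): (n - miss) - H / h_max for 1 <= H <= h_max, else 0."""
    h = len(hyperplanes)
    if not 1 <= h <= h_max:
        return 0.0
    miss = misclassifications(points, labels, hyperplanes)
    return (len(points) - miss) - h / h_max

# Toy data in the plane: the vertical line x = 0.5 separates the two classes.
points = [(0.0, 0.0), (0.2, 1.0), (0.9, 0.3), (1.0, 1.0)]
labels = ["a", "a", "b", "b"]
one_plane = [((1.0, 0.0), 0.5)]
two_planes = one_plane + [((0.0, 1.0), 0.5)]
```

On this toy data both arrangements classify every point correctly, but the single hyperplane receives the higher fitness, which is the behaviour required by criterion ii.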

Let us now explain how the first criterion is satisfied. Let two strings i and j have numbers of misclassifications $miss_i$ and $miss_j$ respectively, and let the numbers of hyperplanes encoded in them be $H_i$ and $H_j$ respectively. Let $miss_i < miss_j$ and $H_i > H_j$. (Note that since the number of misclassified points can only be an integer, $miss_j \ge miss_i + 1$.) Then,

$fit_i = (n - miss_i) - \alpha H_i$,
$fit_j = (n - miss_j) - \alpha H_j$.

The aim now is to prove that $fit_i > fit_j$, or that $fit_i - fit_j > 0$. From the above equations,

$fit_i - fit_j = miss_j - miss_i - \alpha (H_i - H_j)$.

If $H_j = 0$, then $fit_j = 0$ (from Eq. 4) and therefore $fit_i > fit_j$. When $1 \le H_j \le H_{max}$, we have $\alpha (H_i - H_j) < 1$ since $(H_i - H_j) < H_{max}$. Obviously, $miss_j - miss_i \ge 1$. Therefore $fit_i - fit_j > 0$, or, $fit_i > fit_j$.

The second criterion is also fulfilled since $fit_i < fit_j$ when $miss_i = miss_j$ and $H_i > H_j$.
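The two criteria can also be checked exhaustively for small cases. The following sketch evaluates Eq. (3) in exact rational arithmetic for every pair of (miss, H) values with $1 \le H \le H_{max}$; the function names and the sizes used are illustrative assumptions, and the check complements the argument above rather than replacing it.

```python
from fractions import Fraction
from itertools import product

def fit(n, miss, h, h_max):
    """Eq. (3) in exact rational arithmetic, valid for 1 <= h <= h_max."""
    return (n - miss) - Fraction(h, h_max)

def ordering_holds(n, h_max):
    """Check both fitness criteria for every pair of (miss, H) values."""
    cases = list(product(range(n + 1), range(1, h_max + 1)))
    for (mi, hi), (mj, hj) in product(cases, repeat=2):
        fi, fj = fit(n, mi, hi, h_max), fit(n, mj, hj, h_max)
        if mi < mj and not fi > fj:                 # criterion i
            return False
        if mi == mj and hi < hj and not fi > fj:    # criterion ii
            return False
    return True
```

The tightest case is a string with $H_{max}$ hyperplanes and one misclassification fewer than a string with a single hyperplane: the fitness gap is then $1 - (H_{max} - 1)/H_{max} = 1/H_{max}$, which is still positive.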

3.3  Genetic  Operators 

Among the operations of selection, crossover and mutation, the selection operation used here may be one of those used in conventional GA, while crossover and mutation need to be newly defined for VGA. These are now described in detail.

Crossover : Two strings, i and j, having lengths $l_i$ and $l_j$ respectively, are selected from the mating pool. Let $l_i \le l_j$. Then string i is padded with #s so as to make the two lengths equal. Conventional crossover, such as single point or two point crossover (Goldberg, 1989), is now performed over these two strings with probability $\mu_c$. The following two cases may now arise :

• All the hyperplanes in the offspring are complete. (A hyperplane in a string is called complete if all the bits corresponding to it are either defined (i.e., 0s and 1s) or #s. Otherwise it is incomplete.)

• Some hyperplanes are incomplete.

In the second case, let u = number of defined bits (either 0 or 1) and t = total number of bits per hyperplane = $(N - 1) * b_1 + b_2$ (from Eq. 1). Then, for each incomplete hyperplane, all the #s are set to defined bits (either 0 or 1 randomly) with probability $u/t$. In case this is not permitted, all the defined bits are set to #. Thus each hyperplane in the string becomes complete. Subsequently, the string is rearranged so that all the #s are pushed to the end, or in other words all the hyperplanes are transposed to the beginning of the string. The information about the number of hyperplanes in the string is updated accordingly.
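A minimal Python sketch of the padding, single point crossover and completion steps described above is given below. It assumes that strings are lists of the characters '0', '1' and '#', that parent lengths are multiples of the per-hyperplane length l_h, and that the padded length is at least 2; the function names are hypothetical. The decision to apply crossover at all (with probability $\mu_c$) is left to the caller.

```python
import random

def repair(string, l_h, rng):
    """Complete every hyperplane block, then drop the all-'#' blocks."""
    blocks = [string[k:k + l_h] for k in range(0, len(string), l_h)]
    kept = []
    for block in blocks:
        u = sum(bit != "#" for bit in block)       # defined bits in this block
        if 0 < u < l_h:                            # incomplete hyperplane
            if rng.random() < u / l_h:
                block = [b if b != "#" else rng.choice("01") for b in block]
            else:
                block = ["#"] * l_h
        if block[0] != "#":                        # block is now all defined or all '#'
            kept.append(block)
    child = [bit for block in kept for bit in block]
    return child, len(kept)                        # defined bits only, and H

def crossover(parent_a, parent_b, l_h, rng):
    """Pad to equal length with '#', apply single point crossover, then repair."""
    length = max(len(parent_a), len(parent_b))
    a = list(parent_a) + ["#"] * (length - len(parent_a))
    b = list(parent_b) + ["#"] * (length - len(parent_b))
    cut = rng.randrange(1, length)
    return repair(a[:cut] + b[cut:], l_h, rng), repair(b[:cut] + a[cut:], l_h, rng)
```

Returning only the defined bits is equivalent to pushing all the #s to the end of the string; the second element of each returned pair is the updated number of hyperplanes.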

Mutation : In order to introduce greater flexibility in the method, the mutation operator is defined in such a way that it can both increase and decrease the string length. For this, the strings are padded with #s such that the resultant length becomes equal to $l_{max}$. Now for each defined bit position, it is determined whether conventional mutation (Goldberg, 1989) can be applied or not with probability $p_m$. Otherwise, the position is set to # with probability $p_{m_1}$. Each undefined position is set to a defined bit (randomly chosen) according to another mutation probability $p_{m_2}$. These are described in Fig. 2.

Note that mutation may result in some incomplete hyperplanes, and these are handled in the same manner as for the crossover operation. Mutation may also change the string length: the operation on the defined bits, i.e., when $k \le l_i$ in Fig. 2, may result in a decrease in the string length, while the operation on #s, i.e., when $k > l_i$ in the figure, may result in an increase in the string length. Also, mutation may yield a string having all #s, indicating that no hyperplanes are encoded in it. Consequently, this string will have fitness = 0 and will be automatically eliminated during selection.
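The mutation step can be sketched from the prose description as follows. Since Fig. 2 is not reproduced here, the exact control flow is an assumption: a defined bit is flipped with probability p_m, and only if it is not flipped is it turned into '#' with probability p_m1; an undefined position becomes a random bit with probability p_m2. The function name and the list-of-characters representation are likewise illustrative. Incomplete hyperplanes in the result would then be completed as described for crossover.

```python
import random

def mutate(string, l_max, p_m, p_m1, p_m2, rng):
    """Return a mutated copy of `string`, padded with '#' to length l_max."""
    padded = list(string) + ["#"] * (l_max - len(string))
    out = []
    for bit in padded:
        if bit != "#":
            if rng.random() < p_m:
                bit = "1" if bit == "0" else "0"   # conventional bit flip
            elif rng.random() < p_m1:
                bit = "#"                          # may shorten the string
        elif rng.random() < p_m2:
            bit = rng.choice("01")                 # may lengthen the string
        out.append(bit)
    return out
```

With all three probabilities set to zero the function only pads the string, which makes the individual effects of p_m, p_m1 and p_m2 easy to examine one at a time.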

Note that the operations defined here for designing the VGA-classifier are different from those used in (Smith, 1980; Goldberg et al., 1989; Harp and Samad, 1992; Maniezzo, 1994; Srikanth et al., 1995).

As in conventional GAs, the operations of selection, crossover and mutation are performed here over a number of generations till a user specified termination condition is attained. Elitism is incorporated such that the best string seen up to the current generation is preserved in the population. The best string of the last generation, thus obtained, along with its associated labeling of regions, provides the classification boundary of the n training samples. After the design is complete, the task of the classifier is to check, for an unknown pattern, the region in which it lies, and to put the label accordingly.
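The labeling of regions and the classification of an unknown pattern can be sketched as below. As an assumption for the example, hyperplanes are given in decoded form as a normal vector w and an offset d, and a pattern falling in a region that contained no training point receives a caller-supplied default label; how such empty regions should be treated is not stated above, so that choice is purely illustrative.

```python
from collections import Counter, defaultdict

def side_flags(point, hyperplanes):
    """Region code of a point: one 0/1 flag per hyperplane (w, d)."""
    return tuple(
        int(sum(w_k * x_k for w_k, x_k in zip(w, point)) - d >= 0)
        for w, d in hyperplanes
    )

def label_regions(points, labels, hyperplanes):
    """Map each occupied region to the majority class of its training points."""
    votes = defaultdict(Counter)
    for x, y in zip(points, labels):
        votes[side_flags(x, hyperplanes)][y] += 1
    return {region: c.most_common(1)[0][0] for region, c in votes.items()}

def classify(point, hyperplanes, region_labels, default=None):
    """Label of the region containing `point`, or `default` for an unlabeled region."""
    return region_labels.get(side_flags(point, hyperplanes), default)
```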


4  Issues  of  Minimum  miss  and  H 


In this section we prove that the VGA-classifier described above provides the minimal misclassification error during training for an infinitely large number of iterations. At the same time, it requires the minimum number of hyperplanes in doing so.

For proving this we use the result of (Bhandari et al., 1996), where it has been established that, for an infinitely large number of iterations, an elitist model of GA will surely provide the optimal string. In order to prove this convergence they assumed that the probability of going from any string to the optimal one is always greater than zero, and that the probability of going from a population containing the optimal string to one not containing it is zero. Since the mutation operation and elitism of the proposed VGA ensure that both these conditions are met, the result of (Bhandari et al., 1996) regarding the convergence to the optimal string is valid for VGA as well.

Let us now consider the fitness function for string i (Eq. 3). Maximization of the fitness function means minimization of

$miss_i + \alpha H_i = err_i$, say,

where $\alpha = 1 / H_{max}$. Let us call this the error function ($err_i$).

Let, for any size of the training data set (n), the minimum value of the error function as obtained by the VGA-classifier be

$miss^o + \alpha H^o$

after it has been executed for an infinitely large number of iterations. Then, according to Bhandari et al. (1996), this corresponds to the optimal string. Therefore we may write

$miss^o + \alpha H^o \le miss + \alpha H, \; \forall \; miss, H$.    (5)

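Theorem 1 : For any value of H, $1 \le H \le H_{max}$, the minimal number of misclassified points is $miss^o$.

Proof : The proof follows from the definition of the fitness function (Eq. 3) and the fact that $miss^o + \alpha H^o \le miss + \alpha H$, $\forall \; miss, H$ (Eq. 5). Indeed, if some arrangement of H hyperplanes gave $miss \le miss^o - 1$, then, since $\alpha H \le 1$ and $\alpha H^o > 0$, we would have $miss + \alpha H \le miss^o < miss^o + \alpha H^o$, contradicting Eq. 5.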

Theorem 2 : $H^o$ is the minimal number of hyperplanes required for providing $miss^o$ misclassified points.

Proof : Suppose the contrary, i.e., there exists some $H'$, $H' < H^o$, that provides $miss^o$ misclassified points. In that case, the corresponding value of the error function would be $miss^o + \alpha H'$. Note that now $miss^o + \alpha H^o > miss^o + \alpha H'$. This violates Eq. 5. Hence $H' \not< H^o$, and therefore $H^o$ is the minimal number of hyperplanes required for providing $miss^o$ misclassified points.

From Theorems 1 and 2, it is proved that for any value of n, the VGA-classifier provides the minimum number of misclassified points for an infinitely large number of iterations, and that it requires the minimum number of hyperplanes in doing so.

5  Implementation  and  Results 

The experimental investigation presented in this section has two parts. In the first part, the effectiveness of VGA in automatically determining the value of H of the classifier is demonstrated for two sets of artificial data, a speech data set and the Iris data. The recognition scores of the VGA-classifier are also compared with those of the fixed length GA-classifier. Secondly, we compare our concept of using variable string lengths in GA with another similar approach (Srikanth et al., 1995). For this purpose we have implemented their different operators in our classification algorithm for the above mentioned four data sets.

The 2-dimensional artificial data sets, ADS 1 (Fig. 3) and ADS 2 (Fig. 4), consist of 557 and 417 points respectively, belonging to two classes. The real-life speech data, Vowel (Pal and Majumdar, 1977), consists of 871 samples having three feature values (corresponding to the three formant frequencies) and six classes {δ, a, i, u, e, o}. Fig. 5 shows the overlapping class structures in the first and second formant frequency plane. The Iris data comprises 150 samples having four features and three classes, with 50 points in each class.

A fixed population size of 20 is chosen. The roulette wheel strategy (Goldberg, 1989) is used to implement proportional selection. As in an earlier investigation (Bandyopadhyay et al., 1995), single point crossover is applied with a fixed crossover probability of 0.8. A variable value of the mutation probability $p_m$ is selected from the range [0.01, 0.333]. Initially it assumes a high value, gradually decreasing at first, and then increasing again in the later stages of the algorithm. 200 iterations are performed with each mutation probability value. The values of $p_{m_1}$ and $p_{m_2}$ mentioned in Section 3.3 are set to 0.1. The process is executed for a maximum of 3000 iterations. Elitism is incorporated by replacing the worst string of the present generation by the best string seen up to the previous generation.
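The selection and elitism steps used in these experiments can be sketched as follows. The function names and the in-place update are assumptions made for illustration; `random.Random.choices` is used here as one convenient way of realizing roulette wheel (fitness proportional) selection, and it requires that at least one fitness value be positive.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Proportional selection: draw len(population) strings with probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=len(population))

def apply_elitism(population, fitnesses, best_so_far, best_fitness):
    """Overwrite the worst string of the current generation with the best string seen so far."""
    worst = min(range(len(population)), key=fitnesses.__getitem__)
    population[worst] = best_so_far
    fitnesses[worst] = best_fitness
    return population, fitnesses
```

A string with fitness 0, such as one encoding no hyperplanes, has zero selection weight and is therefore never copied into the mating pool.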

Performance of the VGA-classifier

Tables 1 and 2 show the number of hyperplanes $H_{VGA}$ as determined automatically by the VGA-classifier for modeling the class boundaries of the aforesaid four data sets when the classifier is trained with 10% and 50% of the samples respectively. Two different values of $H_{max}$ are used for this purpose, viz., $H_{max} = 6$ and $H_{max} = 10$. The overall recognition scores obtained during testing of the VGA-classifier, along with their comparison with those obtained for the fixed length version (i.e., the GA-classifier) with H = 6 and 10, are also shown. (Note that H = 6 had been found to provide, on average, good recognition scores in earlier experiments (Bandyopadhyay et al., 1995) with these data sets.) The scores provided are the average values obtained over 5 different runs of the algorithms.

Table 1: $H_{VGA}$ and the comparative overall recognition scores (%) during testing
(when 10% of the data set is used for training and the remaining 90% for testing)

            VGA-classifier,           GA-classifier,   VGA-classifier,           GA-classifier,
            $H_{max}$ = 10            H = 10           $H_{max}$ = 6             H = 6
Data set    $H_{VGA}$    Score        Score            $H_{VGA}$    Score        Score

ADS 1       3            95.62        84.26            4            96.21        93.22
ADS 2       6            88.16        84.04            5            88.35        88.29
Vowel       6            73.66        69.21            6            71.19        71.99
Iris        2            95.56        76.29            2            95.81        93.33

Table 2: $H_{VGA}$ and the comparative overall recognition scores (%) during testing
(when 50% of the data set is used for training and the remaining 50% for testing)

            VGA-classifier,           GA-classifier,   VGA-classifier,           GA-classifier,
            $H_{max}$ = 10            H = 10           $H_{max}$ = 6             H = 6
Data set    $H_{VGA}$    Score        Score            $H_{VGA}$    Score        Score

ADS 1       4            96.41        95.92            4            96.83        96.05
ADS 2       5            95.22        94.56            3            96.26        96.17
Vowel       6            78.26        77.77            6            77.11        76.68
Iris        2            97.60        93.33            2            97.67        97.33

The results demonstrate that in all the cases, the VGA-classifier is able to evolve an appropriate value of $H_{VGA}$ from $H_{max}$. In addition, its recognition score on the test data set is found, on average, to be higher than that of the GA-classifier. There is only one exception to this, for the Vowel data with $H_{max} = 6$ when 10% of the samples are used for training (Table 1). In this case, $H_{max} = 6$ does not appear to be a high enough value for modeling the decision boundaries of the Vowel classes with the VGA-classifier. This is reflected in both the tables, where the Vowel scores for the VGA-classifier with $H_{max} = 6$ are less than those with $H_{max} = 10$. In all the cases where the number of hyperplanes for modeling the class boundaries is less than 6, the scores of the VGA-classifier with $H_{max} = 6$ are found to be superior to those with $H_{max} = 10$. This is so because with $H_{max} = 10$ the search space is larger than that for $H_{max} = 6$, which makes it difficult for the classifier to arrive at the optimum arrangement quickly, or within the maximum number of iterations considered here. (Note that it may have been possible to further improve the scores and also reduce the number of hyperplanes if more iterations of VGA had been executed.)

In general, the scores of the GA-classifier (fixed length version) with H = 10 are seen to be lower than those with H = 6 for two reasons: overfitting of the training data and the difficulty of searching a larger space. The only exception is with Vowel for training with 50% of the data, where the score for H = 10 is larger than that for H = 6. This is expected, in view of the overlapping classes of the data set and the significantly large size of the training data. One must note in this context that the detrimental effect of overfitting on the generalization performance increases as the size of the training data decreases.

As  an  illustration,  the  decision  boundary  obtained  by  the  VGA-classifier  for 
ADS  1  when  10%  of  the  data  set  is  chosen  for  training  is  shown  in  Fig.  3. 

Comparison  with  the  Method  in  (Srikanth  et  al.,  1995) 

In this section an investigation is made to compare the performance of our concept of using variable string length in GA with that of another similar approach (Srikanth et al., 1995). For this purpose the operators used in (Srikanth et al., 1995) are implemented here for the same problem of pattern classification using hyperplanes, and the resulting performance is compared to that of our VGA-classifier for the four data sets. Before providing the results, let us describe in brief the method of incorporating variable string lengths in GAs as proposed in (Srikanth et al., 1995).

The initial population is created randomly such that each string encodes the parameters of only one hyperplane. The fitness of a string is characterized by just the number of training points it classifies correctly, irrespective of the number of hyperplanes encoded in it. Among the genetic operators, traditional selection and mutation are used. A new form of crossover, called modulo crossover, is used, which keeps the sum of the lengths of the two chromosomes constant both before and after crossover.

Two other operators are used in conjunction with the modulo crossover for the purpose of faster recombination and juxtaposition. These are the insertion and deletion operators. During insertion, a portion of the genetic material from one chromosome is inserted at a random insert-location in the other chromosome. Conversely, during deletion, a portion of a chromosome is deleted to result in a shorter chromosome.

Tables 3 and 4 show the comparative overall recognition scores during both training and testing of the VGA-classifier for the above mentioned four data sets when our approach of incorporating variable string length is compared with that adopted in (Srikanth et al., 1995), for 10% and 50% training data respectively. Other parameters are kept the same as before. Results shown are the average values taken over five different runs. For keeping parity, the VGA of Srikanth et al. is implemented such that no more than 10 hyperplanes are used for modeling the decision boundary of the data sets. The tables also show the number of hyperplanes, $H_{VGA}$, generated by the two methods for one particular run. Since the VGA of Srikanth et al. does not take care of the minimization of the number of hyperplanes while maximizing the fitness function, its $H_{VGA}$ is usually higher than that of our method.

As is evident from the tables, the performance of the classifier during training is as good or better for the VGA of Srikanth et al. than for the proposed one for all the data sets. The former, in general, uses more hyperplanes (of which many were found to be redundant on investigation), which results in an increase in the execution time. From the training performance, it appears that the operators used by Srikanth et al. are better able to recombine the subsolution blocks into larger blocks. However, this is seen, in general, to result in comparatively poorer scores during testing. To consider a typical example, in one of the cases for the Vowel data set when 10% of the data is used for training, 10 hyperplanes were used to provide a training recognition score of 97.47%, while the recognition score during testing fell to 68.95%.

It is also found that with an increase in the size of the training data, the number of hyperplanes for modeling the class boundaries increases for the algorithm of Srikanth et al. Furthermore, as expected, the performance of all the classifiers improves with an increase in the size of the training data from 10% to 50%.

Table 3: Comparative classification performance of VGA-classifier for Hmax = 10
using two types of variable string lengths (when 10% of the data set is used for
training and the remaining 90% for testing)

            Proposed VGA                     VGA (Srikanth et al.)
Data set    Training    Test        Hvga     Training    Test        Hvga
            score (%)   score (%)            score (%)   score (%)
ADS 1       100         95.62       3        100         93.16       6
ADS 2       92.68       88.16       6        99.10       90.50       6
Vowel       --          --          6        97.36       70.22       9
Iris        100         95.56       2        100         94.98       2

Table 4: Comparative classification performance of VGA-classifier for Hmax = 10
using two types of variable string lengths (when 50% of the data set is used for
training and the remaining 50% for testing)

            Proposed VGA                     VGA (Srikanth et al.)
Data set    Training    Test        Hvga     Training    Test        Hvga
            score (%)   score (%)            score (%)   score (%)
ADS 1       98.18       96.41       4        100.00      96.01       9
ADS 2       97.21       95.22       5        100.00      94.85       7
Vowel       79.73       78.26       6        85.48       78.37       9
Iris        100         97.60       2        100.00      94.67       5

6  Conclusions 

The problem of fixing the appropriate value of H a priori in the GA-classifier
(Bandyopadhyay et al., 1995) has been resolved by using the concept of variable
string lengths in genetic algorithms. New genetic operators are defined to deal
with the concept of variable string lengths for formulating the classifier. The
fitness function has been defined so that its maximization indicates minimization
of the number of misclassified samples as well as the required number of
hyperplanes. It is proved that for an infinitely large number of iterations the
method is able to arrive at the optimal number of misclassified samples and will
need the optimal number of hyperplanes for this purpose.
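
One way to build a fitness function with the two-level property described above is sketched below. The specific form, `(n_samples - n_miss) - alpha * n_hyperplanes` with `alpha = 1 / (h_max + 1)`, is an assumption made for illustration and may differ from the definition given earlier in the article; the names `fitness`, `n_samples`, `n_miss`, `n_hyperplanes` and `h_max` are likewise hypothetical. With this choice the hyperplane penalty is always smaller than 1, so a string with fewer misclassifications always scores higher, and among strings with equal misclassifications the one using fewer hyperplanes wins.

```python
def fitness(n_samples, n_miss, n_hyperplanes, h_max):
    """Illustrative fitness: reward correct classifications first,
    then prefer fewer hyperplanes as a tie-breaker.

    Assumed form (not taken from the article):
        fit = (n_samples - n_miss) - alpha * n_hyperplanes
    with alpha = 1 / (h_max + 1), so the penalty stays strictly below 1.
    """
    if not 1 <= n_hyperplanes <= h_max:
        raise ValueError("n_hyperplanes must lie in [1, h_max]")
    if not 0 <= n_miss <= n_samples:
        raise ValueError("n_miss must lie in [0, n_samples]")
    alpha = 1.0 / (h_max + 1)
    return (n_samples - n_miss) - alpha * n_hyperplanes
```

For example, with `h_max = 10`, a string that misclassifies 5 of 100 samples using 10 hyperplanes still ranks above one that misclassifies 6 samples using a single hyperplane.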

Experimental evidence for different percentages of training and test data
indicates that, given a value of Hmax, the algorithm is able not only to evolve
automatically an appropriate value of H for a given data set, but also to
improve the performance of the classifier. The method of using variable string
length in the algorithm of Srikanth et al. is also implemented in our
VGA-classifier for comparison. Since that method does not include a factor
for reducing the number of surfaces, it is found to use more hyperplanes for
constituting the decision boundary. This results in better training performance,
mostly at the cost of reduced generalization capability. Additionally, its
execution time is higher, since no explicit effort is made to decrease the
number of hyperplanes.

In this connection one may also note that the genetic operators and processing
steps of the VGA described in this article entail very little disruption of those
in the conventional GA. On the other hand, this is not true for the method of
Srikanth et al., which introduces two new processing steps, viz., insertion and
deletion, besides using a significantly different crossover operator. Further,
the proposed method requires the specification of Hmax, whereas such a
constraint is not required for the method of Srikanth et al.

Figure Captions

Fig. 1 - Basic steps in GA.

Fig. 2 - Mutation operation for string i.

Fig. 3 - ADS 1 along with VGA boundary for Hmax = 10 when 10% of the data
set is used for training.

Fig. 4 - ADS 2.

Fig. 5 - Vowel Data in the F1 - F2 plane.

Begin
    t = 0
    initialize population P(t)
    compute fitness P(t)
    repeat
        t = t+1
        select P(t) from P(t-1)
        crossover P(t)
        mutate P(t)
        compute fitness P(t)
    until termination criterion is achieved
End


Figure 1: Basic steps in GA
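
The loop of Figure 1 can be made concrete with a short fixed-length example. The sketch below is a minimal illustration, not the classifier of this article: binary tournament selection, single-point crossover, bit-flip mutation, a fixed number of generations as the termination criterion, and all names and default parameter values (`run_ga`, `tournament`, `p_cross`, `p_mut`, and so on) are assumptions chosen for brevity.

```python
import random

def tournament(pop, fit, rng):
    """Binary tournament: return a copy of the fitter of two random strings."""
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    return list(pop[i] if fit[i] >= fit[j] else pop[j])

def run_ga(fitness_fn, length=20, pop_size=30, generations=40,
           p_cross=0.8, p_mut=0.02, seed=0):
    """Maximize fitness_fn over bit strings of the given length (length >= 2)."""
    rng = random.Random(seed)
    # t = 0: initialize population P(t) and compute its fitness
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    fit = [fitness_fn(s) for s in pop]
    best_fit = max(fit)
    best = list(pop[fit.index(best_fit)])
    for _ in range(generations):  # repeat ... until termination
        # select P(t) from P(t-1), then crossover P(t)
        nxt = []
        while len(nxt) < pop_size:
            a, b = tournament(pop, fit, rng), tournament(pop, fit, rng)
            if rng.random() < p_cross:
                cut = rng.randint(1, length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt.extend([a, b])
        pop = nxt[:pop_size]
        # mutate P(t): flip each bit independently with probability p_mut
        for s in pop:
            for k in range(length):
                if rng.random() < p_mut:
                    s[k] ^= 1
        # compute fitness P(t) and remember the best string seen so far
        fit = [fitness_fn(s) for s in pop]
        if max(fit) > best_fit:
            best_fit = max(fit)
            best = list(pop[fit.index(best_fit)])
    return best, best_fit
```

As a toy usage, `run_ga(sum)` searches for the bit string with the most ones; any function mapping a list of bits to a number can be passed in its place.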


Begin
    l_i = length of string i
    Pad string i with # so that its length becomes l_max
    for k = 1 to l_max do
        Generate rnd, rnd1 and rnd2 randomly in [0,1]
        if k <= l_i do /* defined bits */
            if rnd < μ_m do /* Conventional mutation */
                flip bit k of string i
            else /* try changing to # */
                if rnd1 < μ_m1 do
                    Set bit k of string i to #
                endif
            endif
        else /* k > l_i i.e., # */
            if rnd2 < μ_m2 do /* Set to defined */
                Position k of string i set to 0 or 1 randomly
            endif
        endif
    endfor
End


Figure 2: Mutation operation for string i
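
The mutation procedure of Figure 2 can be sketched in Python as follows. This is an illustrative reading of the figure under stated assumptions: the string is held as a list of the characters '0', '1' and '#', the three threshold probabilities are passed in as parameters under the hypothetical names `p_flip`, `p_to_hash` and `p_to_bit`, and the function name `mutate_variable_length` is also hypothetical. The function returns the padded string; how the '#' positions are subsequently handled is outside the scope of the figure and is not shown.

```python
import random

def mutate_variable_length(bits, l_max, p_flip, p_to_hash, p_to_bit, rng):
    """Mutate a variable-length bit string padded with '#' up to l_max.

    bits      : list of '0'/'1' characters (the defined part of string i)
    p_flip    : probability of a conventional bit flip on a defined bit
    p_to_hash : probability of turning an unflipped defined bit into '#'
    p_to_bit  : probability of turning a '#' into a random defined bit
    """
    l_i = len(bits)
    if l_i > l_max:
        raise ValueError("string is longer than l_max")
    padded = list(bits) + ["#"] * (l_max - l_i)
    for k in range(l_max):  # 0-based, so k < l_i marks the defined bits
        rnd, rnd1, rnd2 = rng.random(), rng.random(), rng.random()
        if k < l_i:
            if rnd < p_flip:           # conventional mutation
                padded[k] = "1" if padded[k] == "0" else "0"
            elif rnd1 < p_to_hash:     # try changing to '#'
                padded[k] = "#"
        elif rnd2 < p_to_bit:          # '#' position: set to defined
            padded[k] = rng.choice("01")
    return padded
```

Turning a defined bit into '#' shortens the effective string and turning a '#' into a bit lengthens it, which is how this single operator lets the string length vary.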